WE CLAIM:

1. A method of setting up a DLSI space-based classifier for document 
classification comprising the steps of: 

preprocessing documents to distinguish terms of a word and a noun phrase from 
stop words; 

constructing system terms by setting up a term list as well as global weights; 
normalizing document vectors of collected documents, as well as centroid vectors 
of each cluster; 

constructing a differential term by intra-document matrix $D_I^{m \times n_I}$, such that each
column in said matrix is a differential intra-document vector;

decomposing the differential term by intra-document matrix $D_I$, by an SVD
algorithm, into $D_I = U_I S_I V_I^T$ ($S_I = \mathrm{diag}(\delta_{I,1}, \delta_{I,2}, \ldots)$), followed by a composition of
$D_{I,k_I} = U_{k_I} S_{k_I} V_{k_I}^T$ giving an approximate $D_I$ in terms of an appropriate $k_I$;

setting up a likelihood function of intra-differential document vector; 

constructing a term by extra-document matrix $D_E^{m \times n_E}$, such that each column of
said extra-document matrix is an extra-differential document vector;

decomposing $D_E$, by exploiting the SVD algorithm, into
$D_E = U_E S_E V_E^T$ ($S_E = \mathrm{diag}(\delta_{E,1}, \delta_{E,2}, \ldots)$), then with a proper $k_E$, defining
$D_{E,k_E}$ to approximate $D_E$;

setting up a likelihood function of extra-differential document vector; 
setting up a posteriori function; and 

using the DLSI space-based classifier to automatically classify a document. 
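
The two decomposition steps of claim 1 can be illustrated with a minimal NumPy sketch. The matrix values, the helper name `truncated_svd`, and the choice k = 2 are illustrative assumptions and are not taken from the claims; the sketch only shows how a rank-k approximation is composed from the SVD factors.

```python
import numpy as np

def truncated_svd(D, k):
    """Return (U_k, s_k, Vt_k), the factors of the rank-k SVD approximation of D."""
    # np.linalg.svd returns singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical 4-term by 3-column differential matrix; each column is one
# differential intra-document vector.
D_I = np.array([
    [ 0.2, -0.1,  0.0],
    [-0.3,  0.4,  0.1],
    [ 0.1, -0.2,  0.3],
    [ 0.0,  0.1, -0.4],
])

U_k, s_k, Vt_k = truncated_svd(D_I, k=2)
D_I_k = U_k @ np.diag(s_k) @ Vt_k  # rank-2 approximation of D_I
```

The same helper would apply unchanged to the extra-document matrix, with its own choice of k.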

2. An automatic document classification method using a DLSI space-based
classifier for classifying a document in accordance with clusters in a database, comprising
the steps of:

a) setting up a document vector by generating terms as well as frequencies of 
occurrence of said terms in the document, so that a normalized document vector N is 
obtained for the document; 

b) constructing, using the document to be classified, a differential document 
vector x = N-C, where C is the normalized vector giving a center or centroid of a 
cluster; 

c) calculating an intra-document likelihood function $P(x \mid D_I)$ for the document;

d) calculating an extra-document likelihood function $P(x \mid D_E)$ for the
document;

e) calculating a Bayesian posteriori probability function $P(D_I \mid x)$;

f) repeating, for each of the clusters of the database, steps b-e;

g) selecting a cluster having a largest $P(D_I \mid x)$ as the cluster to which the
document most likely belongs; and

h) classifying the document in the selected cluster.

3. The method as set forth in claim 2, wherein the normalized document vector N
is obtained using an equation, $b_{ij} = a_{ij} / \sqrt{\sum_{i} a_{ij}^2}$.

4. A method of setting up a DLSI space-based classifier for document
classification, comprising the steps of:

setting up a differential term by intra-document matrix where each column
of the matrix denotes a difference between a document and a centroid of a cluster to 
which the document belongs; 

decomposing the differential term by intra-document matrix by an SVD 
algorithm to identify an intra-DLSI space; 

setting up a probability function for a differential document vector being a 
differential intra-document vector; 

calculating the probability function according to projection and distance 
from the differential document vector to the intra-DLSI space; 

setting up a differential term by extra-document matrix where each 
column of the matrix denotes a differential document vector between a document vector 
and a centroid vector of a cluster which does not include the document; 

decomposing the differential term by extra-document matrix by an SVD 
algorithm to identify an extra-DLSI space; 

setting up a probability function for a differential document vector being a 
differential extra-document vector; 

setting up a posteriori likelihood function using the differential 
intra-document and differential extra-document vectors to provide a most probable 
similarity measure of a document belonging to a cluster; and 

using the DLSI space-based classifier to automatically classify a
document.
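
The two matrix set-up steps of claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `differential_vectors`, the two-dimensional toy vectors, and the decision to pair every document with every other centroid are hypothetical and are not specified by the claims.

```python
def differential_vectors(docs, labels, centroids):
    """Split document-minus-centroid differences into intra and extra sets.

    docs:      normalized document vectors (lists of floats)
    labels:    cluster index of each document
    centroids: normalized centroid vectors, one per cluster
    """
    intra, extra = [], []
    for doc, label in zip(docs, labels):
        for index, centroid in enumerate(centroids):
            diff = [d - c for d, c in zip(doc, centroid)]
            # Own cluster gives an intra-document vector; any other cluster
            # gives an extra-document vector.
            (intra if index == label else extra).append(diff)
    return intra, extra

# Toy data: two documents, two clusters (values are illustrative only).
docs = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centroids = [[0.8, 0.6], [0.6, 0.8]]
intra, extra = differential_vectors(docs, labels, centroids)
```

Each returned list holds the columns of the corresponding differential matrix, ready to be passed to an SVD routine.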

5. The method as set forth in claim 4, wherein the step of setting up a probability
function for a differential document vector being a differential intra-document vector is 
performed using an equation, 



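$$P(x \mid D_I) = \frac{n_I^{1/2} \exp\left(-\frac{n_I}{2} \sum_{i=1}^{k_I} \frac{y_i^2}{\delta_{I,i}^2}\right) \exp\left(-\frac{n_I \varepsilon^2(x)}{2 \rho_I}\right)}{(2\pi)^{n_I/2} \prod_{i=1}^{k_I} \delta_{I,i} \cdot \rho_I^{(r_I - k_I)/2}}$$

where $y = U_{k_I}^T x$, $\varepsilon^2(x) = \|x\|^2 - \sum_{i=1}^{k_I} y_i^2$, $\rho_I = \frac{1}{r_I - k_I} \sum_{i=k_I+1}^{r_I} \delta_{I,i}^2$, and $r_I$ is the rank of
matrix $D_I$.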
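
The likelihood of claim 5 can be evaluated as in the sketch below, which assumes the equation has the form given above. Working in the log domain is a choice made here for numerical safety and is not part of the claim; the function and parameter names are hypothetical.

```python
import math

def log_likelihood(x, U_k, deltas, n):
    """Log of the likelihood P(x | D) for a differential vector x.

    x:      differential document vector (list of floats)
    U_k:    the k leading left singular vectors, each a list as long as x
    deltas: all r nonzero singular values in descending order (r > k)
    n:      number of columns of the differential matrix
    """
    k, r = len(U_k), len(deltas)
    # y = U_k^T x: coordinates of x inside the k-dimensional DLSI space.
    y = [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in U_k]
    # eps2: squared distance from x to the DLSI space.
    eps2 = sum(x_i * x_i for x_i in x) - sum(y_i * y_i for y_i in y)
    # rho: mean of the squared singular values that were truncated away.
    rho = sum(d * d for d in deltas[k:]) / (r - k)
    return (
        0.5 * math.log(n)
        - 0.5 * n * sum(y_i ** 2 / d ** 2 for y_i, d in zip(y, deltas[:k]))
        - n * eps2 / (2.0 * rho)
        - 0.5 * n * math.log(2.0 * math.pi)
        - sum(math.log(d) for d in deltas[:k])
        - 0.5 * (r - k) * math.log(rho)
    )
```

Applying `math.exp` to the result recovers the likelihood itself, although for a large number of columns that value may underflow to zero.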



6. The method as set forth in claim 4, wherein the step of setting up a probability 
function for a differential document vector being a differential extra-document vector is 
performed using an equation, 

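$$P(x \mid D_E) = \frac{n_E^{1/2} \exp\left(-\frac{n_E}{2} \sum_{i=1}^{k_E} \frac{y_i^2}{\delta_{E,i}^2}\right) \exp\left(-\frac{n_E \varepsilon^2(x)}{2 \rho_E}\right)}{(2\pi)^{n_E/2} \prod_{i=1}^{k_E} \delta_{E,i} \cdot \rho_E^{(r_E - k_E)/2}}$$

where $y = U_{k_E}^T x$, $\varepsilon^2(x) = \|x\|^2 - \sum_{i=1}^{k_E} y_i^2$, $\rho_E = \frac{1}{r_E - k_E} \sum_{i=k_E+1}^{r_E} \delta_{E,i}^2$, and $r_E$ is the rank of
matrix $D_E$.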

7. The method as set forth in claim 4, wherein the step of setting up a posteriori 
likelihood function is performed using an equation, 

$$P(D_I \mid x) = \frac{P(x \mid D_I)\,P(D_I)}{P(x \mid D_I)\,P(D_I) + P(x \mid D_E)\,P(D_E)}$$
where $P(D_I)$ is set to $1/n_c$, where $n_c$ is the number of clusters in the database,
and $P(D_E)$ is set to $1 - P(D_I)$.
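
The posterior of claim 7 reduces to a few lines of arithmetic, sketched below. The function name `posterior_intra` and the numeric inputs are illustrative assumptions.

```python
def posterior_intra(p_x_given_intra, p_x_given_extra, n_clusters):
    """Bayesian posterior P(D_I | x) with the prior P(D_I) = 1 / n_clusters."""
    prior_intra = 1.0 / n_clusters
    prior_extra = 1.0 - prior_intra
    numerator = p_x_given_intra * prior_intra
    return numerator / (numerator + p_x_given_extra * prior_extra)

# Hypothetical likelihood values for one document and one cluster.
score = posterior_intra(0.6, 0.2, n_clusters=4)
```

With these inputs the priors are 0.25 and 0.75, so the posterior works out to 0.15 / (0.15 + 0.15), or about 0.5.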

8. The method as set forth in claim 4, the step of using the DLSI space-based 
classifier to automatically classify a document comprising the steps of: 

a) setting up a document vector by generating terms as well as frequencies of 
occurrence of said terms in the document, so that a normalized document vector N is 
obtained for the document; 

b) constructing, using the document to be classified, a differential document
vector x = N-C, where C is the normalized vector giving a center or centroid of a
cluster;

c) calculating an intra-document likelihood function $P(x \mid D_I)$ for the document;

d) calculating an extra-document likelihood function $P(x \mid D_E)$ for the
document;

e) calculating a Bayesian posteriori probability function $P(D_I \mid x)$;

f) repeating, for each of the clusters of the database, steps b-e;

g) selecting a cluster having a largest $P(D_I \mid x)$ as the cluster to which the
document most likely belongs; and

h) classifying the document in the selected cluster. 
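
Steps b) through h) amount to a loop over the clusters, sketched below. The two likelihood callables are toy stand-ins, the prior of one over the number of clusters is borrowed from claim 7, and all names are hypothetical; none of this is asserted to be the claimed implementation.

```python
import math

def classify(doc_vector, centroids, likelihood_intra, likelihood_extra):
    """Return (cluster index, posterior) for the cluster with the largest P(D_I | x)."""
    prior_intra = 1.0 / len(centroids)  # assumption: prior taken from claim 7
    best_cluster, best_posterior = None, -1.0
    for index, centroid in enumerate(centroids):              # step f
        x = [n - c for n, c in zip(doc_vector, centroid)]     # step b: x = N - C
        p_intra = likelihood_intra(x)                         # step c
        p_extra = likelihood_extra(x)                         # step d
        numerator = p_intra * prior_intra
        posterior = numerator / (numerator + p_extra * (1.0 - prior_intra))  # step e
        if posterior > best_posterior:                        # step g
            best_cluster, best_posterior = index, posterior
    return best_cluster, best_posterior                       # step h

# Toy likelihoods: intra decays with distance from the centroid, extra is flat.
toy_intra = lambda x: math.exp(-sum(v * v for v in x) / 0.1)
toy_extra = lambda x: 0.5

cluster, score = classify([0.8, 0.6], [[1.0, 0.0], [0.0, 1.0]], toy_intra, toy_extra)
```

In practice the two callables would be built from the intra-DLSI and extra-DLSI spaces rather than from the toy functions used here.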

9. The method as set forth in claim 8, wherein the normalized document
vector N is obtained using an equation, $b_{ij} = a_{ij} / \sqrt{\sum_{i} a_{ij}^2}$.
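
The normalization in claim 9 scales each document vector to unit Euclidean length. A minimal sketch follows, assuming the raw entries are term weights for a single document; the function name and the sample values are hypothetical.

```python
import math

def normalize(a):
    """Scale a raw term-weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in a))
    if norm == 0.0:
        return list(a)  # a document with no terms: nothing to scale
    return [v / norm for v in a]

# Raw weights for three hypothetical terms in one document.
b = normalize([3.0, 0.0, 4.0])
```

Here the norm is 5, so the normalized entries are 3/5, 0, and 4/5.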